Stochastic Cubic Regularization for Fast Nonconvex Optimization

Tripuraneni, Nilesh, Stern, Mitchell, Jin, Chi, Regier, Jeffrey, Jordan, Michael I.

Neural Information Processing Systems

This paper proposes a stochastic variant of a classic algorithm---the cubic-regularized Newton method [Nesterov and Polyak]. The proposed algorithm efficiently escapes saddle points and finds approximate local minima for general smooth, nonconvex functions in only $\tilde{\mathcal{O}}(\epsilon^{-3.5})$ stochastic gradient and stochastic Hessian-vector product evaluations. The latter can be computed as efficiently as stochastic gradients. This improves upon the $\tilde{\mathcal{O}}(\epsilon^{-4})$ rate of stochastic gradient descent. Our rate matches the best-known result for finding local minima without requiring any delicate acceleration or variance-reduction techniques.
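As a rough illustration of the method's core primitive, here is a minimal sketch (in Python with NumPy) of a cubic-regularized step in which the Hessian is accessed only through Hessian-vector products. The toy objective, step sizes, and iteration counts are all assumptions for illustration, not the paper's implementation:

import numpy as np

def cubic_subsolver(g, hvp, rho, steps=100, eta=0.2):
    # Approximately minimize the cubic submodel
    #   m(s) = g^T s + 0.5 s^T B s + (rho / 3) ||s||^3
    # by gradient descent; B enters only through hvp(v) = B v.
    s = np.zeros_like(g)
    for _ in range(steps):
        grad_m = g + hvp(s) + rho * np.linalg.norm(s) * s
        s = s - eta * grad_m
    return s

# Toy nonconvex objective f(x) = ||x||^4 / 4 - ||x||^2 / 2,
# with a strict saddle at the origin and minima on the unit sphere.
def grad_f(x):
    return (x @ x - 1.0) * x

def hvp_f(x, v):
    # Exact Hessian-vector product: ((||x||^2 - 1) I + 2 x x^T) v.
    return (x @ x - 1.0) * v + 2.0 * (x @ v) * x

rng = np.random.default_rng(0)
x = 1e-3 * rng.standard_normal(5)            # start near the saddle
rho = 2.0                                    # assumed Hessian-Lipschitz bound
for _ in range(20):
    g = grad_f(x) + 0.01 * rng.standard_normal(5)   # stochastic gradient
    s = cubic_subsolver(g, lambda v: hvp_f(x, v), rho)
    x = x + s
print(np.linalg.norm(x))                     # escapes the saddle; approaches 1

Even though gradient descent alone stalls near the origin (the gradient there is nearly zero), the cubic submodel exploits the negative curvature reported by the Hessian-vector products and pushes the iterate off the saddle.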


Reviews: Stochastic Cubic Regularization for Fast Nonconvex Optimization

Neural Information Processing Systems

This submission is interested in stochastic nonconvex optimization problems, in which only stochastic estimates of the objective and its derivatives can be accessed at every iteration. The authors develop a variant of the cubic regularization framework that only requires access to stochastic gradients and products of stochastic Hessians with vectors. Such a method is shown to reach a point at which the gradient norm is smaller than $\epsilon$ and the minimum Hessian eigenvalue is larger than $-\sqrt{\rho \epsilon}$ using a number of stochastic queries (gradient or Hessian-vector product evaluations) of order $\epsilon^{-3.5}$, which improves over the classical complexity of Stochastic Gradient Descent (SGD). The problem of interest is clearly introduced, along with the possible advantages of using both stochastic estimates and a cubic regularization framework. The associated literature is correctly reviewed, and the authors even cite contemporary work that achieves similar complexity guarantees but relies on stochastic gradient estimates and variance reduction.
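To make the query model concrete: the Hessian is never formed explicitly; each Hessian-vector product can be obtained from two extra gradient evaluations (or a single automatic-differentiation pass), which is why such products count as roughly one stochastic-gradient query in the complexity bound. A small sketch of the finite-difference version, using an assumed quadratic test function rather than anything from the paper:

import numpy as np

def hvp_by_differences(grad_fn, x, v, h=1e-5):
    # Hessian-vector product via a central difference of gradients:
    #   H(x) v ~= (grad_f(x + h v) - grad_f(x - h v)) / (2 h).
    # Two gradient calls per product, hence "as cheap as gradients".
    return (grad_fn(x + h * v) - grad_fn(x - h * v)) / (2.0 * h)

# Check on f(x) = 0.5 x^T A x, where the exact product is A v.
A = np.array([[2.0, 0.5],
              [0.5, -1.0]])               # indefinite: f has a saddle
grad_fn = lambda x: A @ x
x = np.array([0.3, -0.2])
v = np.array([1.0, 0.0])
print(hvp_by_differences(grad_fn, x, v))  # ~ A @ v = [2.0, 0.5]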

